Benchmarking Graph Databases with Cyclone Benchmark
نویسنده
چکیده
Recent years have seen advances in graph databases and graph database management systems (GDBMS). In a typical graph data model, a node represents an entity and an edge represents a relationship between two nodes. Nodes and edges typically have associated properties (attributes) and are assigned types for fast query response times. In the application domains where relationships are of importance, GDBMS increasingly gains popularity since the relationships can be explicitly modeled and easily visualized in a graph data model. To measure performance of GDBMS, a number of graph database benchmarks have been proposed. Nonetheless, these benchmarks are not yet as rigorous as those of relational database management systems (RDBMS). Inspired by Wisconsin Benchmark, we propose Cyclone Benchmark for measuring performance of graph databases in several aspects of which some have not been investigated in the literature. Our benchmark comes with (1) two data graph models: a simple model with all nodes of the same node type and a complex model with multiple node types, (2) data graph generation programs, and (3) Create, Read, Update, and Delete (CRUD) operations. The data graph generation programs create a graph structure and annotate values of node and edge attributes in the graph to allow for a study of the impact of attribute selectivity factors as well as a correlation between attributes. The programs generate a predefined graph structure, a random graph, or a Kronecker graph that has been shown to model real-world networks well. The read operations include several graph structure queries. We measured the average execution times of the proposed CRUD operations on several synthetic graphs generated by the benchmark with a varying number of nodes from 1,000 to 1,000,000 nodes. The graphs were stored as graphs in Neo4j, a popular native GDBMS, and as relations in MySQL, a popular RDBMS. For most CRUD operations including the graph structure queries in our benchmark, MySQL was significantly faster than Neo4j.
منابع مشابه
Analysis and Experimental Comparison of Graph Databases
In the recent years a new type of NoSQL databases, called Graph databases (GDBs), has gained significant popularity due to the increasing need of processing and storing data in the form of a graph. The objective of this thesis is a research on possibilities and limitations of GDBs and conducting an experimental comparison of selected GDB implementations. For this purpose the requirements of a u...
متن کاملCyclone: java-based querying and computing with Pathway/Genome databases
UNLABELLED Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibili...
متن کاملBenchmarking Graph Databases on the Problem of Community Detection
Thanks to the proliferation of Online Social Networks (OSNs) and Linked Data, graph data have been constantly increasing, reaching massive scales and complexity. Thus, tools to store and manage such data efficiently are absolutely essential. To address this problem, various technologies have been employed, such as relational, object and graph databases. In this paper we present a benchmark that...
متن کاملFoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics
We present FoodBroker, a new data generator for benchmarking graph-based business intelligence systems and approaches. It covers two realistic business processes and their involved master and transactional data objects. The interactions are correlated in controlled ways to enable non-uniform distributions for data and relationships. For benchmarking data integration, the generated data is store...
متن کاملA Review of Benchmarking Content Based Image Retrieval
Benchmarking Content Based Image Retrieval (CBIR) systems allows researchers and developers to compare the strengths of different approaches, and is an essential step towards establishing the credibility of CBIR systems for commercial applications. Here we introduce the problem of developing a benchmark, discuss some of the issues involved, and provide a review of current and recent benchmarkin...
متن کامل